COSINE: non-seeding method for mapping long noisy sequences
نویسندگان
چکیده
Third generation sequencing (TGS) are highly promising technologies but the long and noisy reads from TGS are difficult to align using existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads contaminated by a high level of errors. COSINE computes the context similarity of two stretches of nucleobases given the similarity over distributions of their short k-mers (k = 3-4) along the sequences. The results on simulated and real data show that COSINE achieves high sensitivity and specificity under a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods.
منابع مشابه
Second order statistics spectrum estimation method for robust speech recognition
A second order statistics spectrum estimation (SOSSE) method for speech enhancement is presented. DFT amplitude spectral components of noisy signal are assumed to be random values. Upon first and second order statistic values estimation of noise-only spectrum, an enhancement of noisy signal spectrum was performed. As a reference, a fast discrete cosine transform based signal subspace (FDCTSS) m...
متن کاملPhylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467
Now, it is clear that protein is just one of the most functional products produced by the eukaryotic genome. Indeed, a major part of the human genome is transcribed to non-coding sequences than to the coding sequence of the protein. In this study, we selected three long non-coding RNAs namely AK082072, AK043754 and AK082467 which show brain expression and local region conservation among vertebr...
متن کاملDesign and Simulation of a Modified 32-bit ROM-based Direct Digital Frequency Synthesizer on FPGA
This paper presents a modified 32-bit ROM-based Direct Digital Frequency Synthesizer (DDFS). Maximum output frequency of the DDFS is limited by the structure of the accumulator used in the DDFS architecture. The hierarchical pipeline accumulator (HPA) presented in this paper has less propagation delay time rather than the conventional structures. Therefore, it results in both higher maximum ope...
متن کاملروشی جدید برای تفکیک و طبقهبندی توالیهای سرطانی و غیرسرطانی DNA با استفاده از الگوریتمهای مبتنی بر LPC و SVD
The growing pace of cancer has encouraged researchers to deliberate several aspects of this malignant disease. Genetic-induced nature of cancer, heighten the importance of studying intra-cell components. This paper has been carried out with the aim of making some specific and unique features clear from those long DNA sequences by employing well-established DNA sequence analysis techniques. The ...
متن کاملP87: The Role of the Long Non-Coding RNA Sequences (LncRNAs) in Neurological Disorders
Precise interpretation of the transcriptome sequences in the several species showed that the major part of genome has been transcribed; however, just a few amounts of the transcription sequences have open-reading frames which are conversed during the evolution. So, it is unlikely that many of the transcribed sequences code the proteins. Among the all human non-coding transcripts, at least 10000...
متن کامل